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Abstract 

Monte Carlo methods are often necessary for the implementation of optimal Bayesian estimators. 
A fundamental technique that can be used to generate samples from virtually any target probability 
distribution is the so-called rejection sampling method, which generates candidate samples from a 
proposal distribution and then accepts them or not by testing the ratio of the target and proposal densities. 
The class of adaptive rejection sampUng (ARS) algorithms is particularly interesting because they can 
achieve high acceptance rates. However, the standard ARS method can only be used with log-concave 
target densities. For this reason, many generalizations have been proposed. 

In this work, we investigate two different adaptive schemes that can be used to draw exactly from 
a large family of univariate probability density functions (pdf's), not necessarily log-concave, possibly 
multimodal and with tails of arbitrary concavity. These techniques are adaptive in the sense that every 
time a candidate sample is rejected, the acceptance rate is improved. The two proposed algorithms can 
work properly when the target pdf is multimodal, with first and second derivatives analytically intractable, 
and when the tails are log-convex in a infinite domain. Therefore, they can be appUed in a number of 
scenarios in which the other generalizations of the standard ARS faU. Two illustrative numerical examples 
are shown. 

Index Terms 

Rejection sampling; adaptive rejection sampUng; ratio of uniforms method; particle filtering; Monte 
Carlo integration; volatiUty model. 
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I. Introduction 

Monte Carlo methods are often necessary for the implementation of optimal Bayesian estimators 
and, several families of techniques have been proposed ^ [121 [HI [D that enjoy numerous applications. 
A fundamental technique that can be used to generate independent and identically distributed (i.i.d.) 
samples from virtually any target probability distribution is the so-called rejection sampling method, 
which generates candidate samples from a proposal distribution and then accepts them or not by testing 
the ratio of the target and proposal densities. 

Several computationally efficiencient methods have been designed in which samples from a scalar 
random variable (r.v.) are accepted with high probability. The class of adaptive rejection sampling 
(ARS) algorithms |[Tll is particularly interesting because high acceptance rates can be achieved. The 
standard ARS algorithm yields a sequence of proposal functions that actually converge towards the target 
probability density distribution (pdf) when the procedure is iterated. As the proposal density becomes 
closer to the target pdf, the proportion of accepted samples grows (and, in the limit, can also converge 
to 1). However, this algorithm can only be used with log-concave target densities. A variation of the 
standard procedure, termed adaptive rejection Metropolis sampling (ARMS) IITQI can also be used with 
multimodal pdf 's. However, the ARMS algorithm is based on the Metropolis-Hastings algorithm, so the 
resulting samples form a Markov chain. As a consequence, they are correlated and, for certain multimodal 
densities, the chain can be easily trapped in a single mode (see, e.g., [23]). 

Another extension of the standard ARS technique has been proposed in |[T4l. where the same method 
of flu is applied to T-concave densities, with T being a monotonically increasing transformation, 
not necessarily the logarithm. However, in practice it is hard to find useful transformations other than 
the logarithm and the technique cannot be applied to multimodal densities either. The method in [6| 
generalizes the technique of fl4l to multimodal distributions. It involves the decomposition of the T- 
transformed density into pieces which are either convex and concave on disjoint intervals and can be 
handled separately. Unfortunately, this decomposition requires the ability to find all the inflection points 
of the T-transformed density, which can be something hard to do even for relatively simple practical 
problems. 

More recently, it has been proposed to handle multimodal distributions by decomposing the log-density 
into a sum of concave and convex functions [13 |. Then, every concave/convex element is handled using a 
method similar to the ARS procedure. A drawback of this technique is the need to decompose the target 
log-pdf into concave and convex components. Although this can be natural for some examples, it can 
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also be very though for some practical problems (in general, identifying the convex and concave parts 
of the log-density may require the computation of all its inflection points). Moreover, the application 
of this technique to distributions with an infinite support requires that the tails of the log-densities be 
strictly concave. Other adaptive schemes that extend the standard ARS method for multimodal densities 
have been very recently introduced in |[22l |23l . These techniques include the original ARS algorithm as 
a particular case and they can be relatively simpler than the approaches in |6i or [1131 because they do 
not require the computation of the inflection points of the entire (transformed) target density. However, 
the method in ||23l and the basic approach in Il22ll can also break down when the tails of the target pdf 
are log-convex, the same as the techniques in ifTTl [T3l [T4l . 

In this paper, we introduce two different ARS schemes that can be used to draw exactly from a large 
family of univariate pdf's, not necessarily log-concave and including cases in which the pdf has log- 
convex tails in an infinite domain. Therefore, the new methods can be applied to problems where the 
algorithms of, e.g., CH [H [131 |22i are invalid. 

The first adaptive scheme described in this work was briefly suggested in Ell as alternative strategy. 
Here, we take this suggestion and elaborate it to provide a full description of the resulting algorithm. This 
procedure can be easy to implement and provide good performance as shown in a numerical example. 
However, it presents some technical requirements that can prevent its use with certain densities, as we 
also illustrate in a second example. 

The second approach introduced here is more general and it is based on the ratio of uniforms (RoU) 
technique |l3l[l7l|25l. The RoU method enables us to obtain a two dimensional region A such that drawing 
from the univariate target density is equivalent to drawing uniformly from A. When the tails of the target 
density decay as (or faster), the region A is bounded and, in such case, we introduce an adaptive 
technique that generates a collection of non-overlapping triangular regions that cover A completely. Then, 
we can efficiently use rejection sampling to draw uniformly from A (by first drawing uniformly from the 
triangles). Let us remark that a basic adaptive rejection sampling scheme based on the RoU technique 
was already introduced in llT9l [20l but it only works when the region A is strictly convex^ The adaptive 
scheme that we introduce can also be used with non-convex sets. 

The rest of the paper is organized as follows. The necessary background material is presented in Section 
[n] The first adaptive procedure is described in Section [In] while the adaptive RoU scheme is introduced 
in Section |IV[ In Section |V] we present two illustrative examples and we conclude with a brief summary 



'it can be seen as a RoU-based counterpart of the original ARS algorithm in LllJ . that requires the log-density to be concave. 
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and conclusions in Section |Vll 

II. Background 

In this Section we recall some background material needed for the remaining of the paper. First, in 



Sections II-A and II-B we briefly review the rejection sampling method and its adaptive implementation, 



respectively. The difficulty of handling target densities with log-convex tails is discussed in Section II-C 



Finally, we present the ratio of uniforms (RoU) method in Section II-D 



A. Rejection sampling 

Rejection sampling [3, Chapter 2] is a universal method for drawing independent samples from a target 
density po{x) > known up to a proportionality constant (hence, we can evaluate p{x) oc po{x)). Let 
exp{— M^(2;)} be an overbounding function for p{x), i.e., exp{— > p{x). We can generate 
i.i.d. samples from Po{x) according to the standard rejection sampling algorithm: 

1) Set i = 1. 

2) Draw samples x' from tt{x) oc exp{— and u' from U{0, 1), where U{0, 1) is the uniform pdf 
in [0,1]. 

^) cxp{%v{x)} — ^' ^^^^ ~ ^' set i = i + 1, else discard x' and go back to step 2. 

4) If i > then stop, else go back to step 2. 
The fundamental figure of merit of a rejection sampler is the mean acceptance rate, i.e., the expected 
number of accepted samples over the total number of proposed candidates. In practice, finding a tight 
overbounding function is crucial for the performance of a rejection sampling algorithm. In order to 
improve the mean acceptance rate many adaptive variants have been proposed IH |TT| \T3\ [I4j l22l . 

B. An adaptive rejection sampling scheme 

We describe the method in ll22l . that contains the the standard ARS algorithm of lOTI as a particular 
case. Let us write the target pdf as Po{x) oc p{x) = exp {— y(x; g)}, D C M, where D C M is the support 
if Po{x) and V{x;g) is termed the potential function of Po{x). In order to apply the technique of 1.22.1 . 
we assume that the potential function can be expressed ass the addition of n terms, 

V{x-g)^Y.^i{g,{x)), (1) 

i=l 

where the functions Vi, i = l,...,n, referred to as marginal potentials, are convex while every gi{x), 
i = 1, n, is either convex or concave. As a consequence, the potential V{x; g) is possible non-convex 
(hence, the standard ARS algorithm cannot be applied) and can have rather complicated forms. 
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The sampling method is iterative. Assume that, after the (t — l)-th iteration, there is available a set of mt 
distinct support points, St = {si, S2, • • • , Smt} C V, sorted in ascending order, i.e., si < S2 < • • • < Smt- 
From this set, we define rrit + l intervals of the form Iq = {—oo, si], Ik = [sfc, Sfc+i], k = 1, . . . , m-t — 1, 
and Irrit = [-Smtj+oo). For each interval X^, it is possible to construct a vector of linear functions 
^k{x) = [ri^k{x), ■ ■ ■ ,rn,k{x)] such that Vi{ri^k{x)) < Vi{gi{x)) (see |22|) and, as a consequence, 

V{x;rk) < V{x;g), for all x e Ik (2) 

and k = 0, ...,mt. Hence, it is possible to obtain exp{— y(x; r^)} > p{x), Vx G Ik- Moreover, the 
modified potential V{x; r^) is strictly convex in the interval Ik and, therefore, it is straightforward to 
build a linear function Wk{x), tangent to V{x; Vk), such that Wk{x) < V{x\ r^) and exp{— ^^(x)} > p{x) 
for all X E Xfc. 

With these ingredients, we can outline the generalized ARS algorithm as follows. 

1) Initialization. Set i = \, t = Q and choose mi support points. Si = {si, Sm^}. 

2) Iteration. For t > 1, take the following steps. 

• From St, determine the intervals Xq, . . . ,Imt- 

• For k = {),... , nit, construct rk, V{x; r^) and Wk{x). 

• Let Wt{x) = Wk{x), if X G Ik, k e {0, . . . , nit}. 

• Build the proposal pdf vrf(x) oc exp{—Wt{x)}. 

• Draw x' from the proposal iTt{x) and u' from U{0, 1). 

• If < exp{-w-t(x')} ' accept x(*) = x' and set St+i = St, mt+i = nit,i = i + l. 

• Otherwise, if u' > cxp{^Wt{x')} ' ^^^^ reject x', set St+i = StU {x'} and update nit+i = mt + 1. 

• Sort St+i in ascending order and increment t = t + l. lfi>N, then stop the iteration. 
Note that it is very easy to draw from iTt{x) because it consists of pieces of exponential densities. 
Moreover, every time we reject a sample x', it is incorporated into the set of support points and, as a 
consequence, the shape of the proposal TTt{x) becomes closer to the shape of the target Po{x) and the 
acceptance rate of the candidate samples is improved ll22l . 



C. Log-convex tails 



The algorithm of Section II-B breaks down when the potential function V{x; g) has both an infinite 
support (x G P = R) and concave tails (i.e, the target pdf Po{x) has log-convex tails). In this case, 
the function Wt{x) becomes constant over an interval of infinite length and we cannot obtain a proper 
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proposal pdf 7rt{x). To be specific, if V{x;g) is concave in the intervals (—00, si], [smt,+oo) or both, 
then wo{x), Wmtix), or both, are constant and, as a consequence, ex.p{—Wt{x)}dx = +00. 

This difficulty with the tails is actually shared by other adaptive rejection sampling techniques in the 
literature. A theoretical solution to the problem is to find an invertible transformation G : D — )• D* , where 
"D* C M is a bounded set ||3j [SI EU |23 . Then, we can define a random variable Y = G{x) with density 
q{y), draw samples y^^\ . . . , y^^^ and then convert them into samples x^^^ = ^ (y ),..., x^'^^ = 
G^^{y^'^^) from the target pdf Po{x) of the r.v. X. However, in practice, it is difficult to find a suitable 
transformation G, since the resulting density q(y) may not have a structure that makes sampling any 
easier than in the original setting. 

A similar, albeit more sophisticated, approach is to use the method of [6 |. In this case, we need to build 
a partition of the domain V with disjoint intervals V = \J^^T>i and then apply invertible transformations 
Tj : — M, i = 1, m, to the target function p{x). In particular, the intervals Di and Dm contain 
the tails of p{x) and the method works correctly if the composed functions (Ti op){x) and (T^ op){x) 
are concave. However, finding adequate Ti and Tm is not necessarily a simple task and, even if they 
are obtained, applying the algorithm of [6| requires the ability to compute all the inflection points of the 
target function p{x). 

D. Ratio of uniforms method 

The RoU method O [TTl |25| is a sampling technique that relies on the following result. 
Theorem 1: Let q{x) > be a pdf known only up to a proportionality constant. If (u, v) is a sample 
drawn from the uniform distribution on the set 

A = {(f,n) G . Q < ^ < ^q(^y/u)Y (3) 

then X = ^ is a sample form q{x). 
Proof: See [3 Theorem 7.1]. 

Therefore, if we are able to draw uniformly from A, then we can also draw from the pdf qo{x) oc q{x). 
The cases of practical interest are those in which the region A is bounded, and A is bounded if, and 
only if, both \/q{x) and x^J q(x) are bounded. Moreover, the function \/q{x) is bounded if, and only if, 
the target density qo{x) oc q{x) is bounded and, assuming we have a bounded target function q{x), the 
function x\/q{x) is bounded if, and only if, the tails of q{x) decay as or faster. Chung97,Wakefield91 
defining different region A that could be used for fatter tails. 
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Figure [T] (a) depicts a bounded set A. Note that, for every angle a e (— 7r/2, +7r/2) rad, we can 
draw a straight Une that passes through the origin (0,0) and contains points {vi,Ui) € A such that 
X = ^ = tan(a), i.e., every point {vi,Ui) in the straight Une with angle a yields the same value of x. 

From the definition of A, we obtain Ui < q{x) and Vi = xui < x^J q{x), hence, if we choose the point 
{v2,U2) that lies on the boundary of A, U2 = \/q{x) and V2 = xyjq{x). Moreover, we can embed the 
set A in the rectangular region 

TZ = [ {v , u):0<u'< sup a/ q{x), 

(4) 

inf x\/ q{x) <v'< s\xpx\/q{x) >, 

X X J 

as depicted in Fig. [T] (b). 

Once TZ is constructed, it is straightforward to draw uniformly from A by rejection sampling: simply 
draw uniformly from TZ and then check whether the candidate point belongs to A. 




Fig. 1. (a) A bounded region A and tlie straiglit line v — xu corresponding to the sample x = tan(a). Every point in the 
intersection of the line v = xu and the set A yields the same sample x. The point on the boundary, («2,ti2), has coordinates 
«2 ~ Xy/q(x) and U2 — \/ q{x). (b) If the two functions \fq{x) and x^q{x) are bounded, the set A is bounded and embedded 
in the rectangle TZ. 



III. Adaptive rejection sampling with log-convex tails 

In this section, we investigate a strategy proposed in fTT\ to obtain an adaptive rejection sampling 
algorithm that remains valid the tails of the potential function V{x\ g) are concave (i.e, the target pdf 
Po{x) has log-convex tails). 

Let us assume that for some j G {1, . . . , n}, the pdf defined as 

q{x) oc eKg{-Vj{gj{x))} (5) 
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is such that: (a) we can integrate q{x) over the intervals lo^^i, ■■■,^mt and (b) we can sample from the 
density q{x) restricted to every 1^. To be specific, let us introduce the reduced potential 

n 

F_,(x;g)^ Viigiix)), (6) 

attained by removing Vj{gj{x)) from V{x; g). It is straightforward to obtain lower bounds 7^ < V-j{x; g) 
by applying the procedure explained in the Appendix to the reduced potential V-j{x; g) in every interval 
2jt, k = 0, ...,mt. Once these bounds are available, we set = exp{— 7^} and build the piecewise 
proposal fiinction 

Lq exp{ - Vj {gj (x) ) } , Vx G Xq , 
7rt(x)cx.j Lkex.p{-Vj{gj{x))}, Vx G X^, (7) 

Notice that, for all x G Ik, we have > exp{— (x; g)} and multiplying both sides of this inequality 
by the positive factor exp{—Vj{gj{x))} > 0, we obtain 

Lfcexp{-Fj(c/j(x))} > exp{-y(x;g)}, Vx G Ik, 

hence TTt{x) is suitable for rejection sampling. 

Finally, note that 7rt(x) is a mixture of truncated densities with non-overlapping supports. Indeed, let 
us define the mixture coefficients 

Oik = Lk / q{x)dx (8) 
and normalize them as ak = otk/ Yll^=o ^k- Then, 

rrit 

M^) = ^akq{x)xk{x) (9) 

k=l 

where Xk{x) is an indicator function (Xfc(x) = 1 if x G and Xk{x) = if x ^ X^). The complete 
algorithm is summarized below. 

1) Initialization. Set i = 1, t = and choose mi support points. Si = {si, Smi}- 

2) Iteration. For i > 1, take the following steps. 

• From St, determine the intervals Xq, . . . ,Imt- 

• Choose a suitable pdf q{x) oc exp{—Vj{gj{x))} for some j G {1, n}. 

• For k = 0, . . . , m,t, compute the lower bounds < ^-jix, g), Vx G Ik, and set Lk = exp{— 7;^} 
(see the Appendix). 
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• Build the proposal pdf vrf(x) oc Lkq{x) for x € X^, /c G {0, . . . , mt}. 

• Draw x' from the proposal TTt{x) in Eq. (|9]l and u' from ZY(0, 1). 

• If < '^^P^~^£i^^^ 'g)) ^ then accept x^*) = x' and set St+i = St, mt+i = rrit, i = i + 1. 

• Otherwise, if u' > ^ '^^^ , then reject x', set St+i = StU{x'} and update m^+i = m^+l. 

• Sort St+i in ascending order and increment t = t + l. lfi>N, then stop the iteration. 
When a sample x' drawn from vr^ (x) is rejected, x' is added to the set of support points St+i = StLl{x'}. 

Hence, we improve the piecewise constant approximation of V-j{x; g) (formed by the upper bounds L^) 
and proposal pdf 7rf+i(x) oc Lk exp{—Vj{gj{x))}, Vx G Ik, becomes closer to the target pdf po{x). 

Figure [2] (a) illustrates the reduced potential exp{— y_j (x; g)} and its stepwise approximation 
Lk = exp{— 7fc} built using = 4 support points. Figure |2] (b) depicts the target pdf po{x) jointly 
with the proposal pdf ■Kt{x), composed by weighted pieces of the function exp{— ^^(^(^(x))} (shown, in 
dashed line). 

This procedure is feasible only if we can find a pair Vj, gj, for some j € {1, such that the pdf 

q{x) oc exp{-Vj{gj{x))} is 

• is analytically integrable in every interval I C V, given otherwise the weights ai,...,amt in Eq. Q 
cannot be computed in general, and 

• is easy to draw from when truncated into a finite or an infinite interval, since otherwise we cannot 
generate samples from it. 

Note that in order to draw from iTt{x) we need to be able to draw from and to integrate 
analytically truncated pieces of q{x). This procedure is possible only if we have on hand a suitable 
pdf q{x) oc exp{-Vj{gj{x))}. 

The technique that we introduce in the next section, based on ratio of uniforms method, overcomes 
these constraints. 

IV. Adaptive RoU scheme 

The RoU method can be a useful tool to deal with target pdf 's Po{x) oc p{x) = exp{— y(x; g)} where 
V{x; g) of the form in Eq. ([T]) in a infinite domain and with tails of arbitrary concavity. Indeed, as long 
as the tails of p{x) decay as 1/x^ (or faster), the region A defined by the RoU transformation is bounded 
and, therefore, the problem of drawing samples from the target density becomes equivalent to that of 
drawing uniformly from a finite region. 

In order to draw uniformly from A by rejection sampling, we need to build a suitable set "P ^ ^ for 
which we are able to generate uniform samples easily. This, in turn, requires knowledge of upper bounds 
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(a) (b) 

Fig. 2. (a) Example of the function exp{ — V-j (x; g)} and its stepwise approximation ~ exp{— 7^}, k = 0, ...,mt — 4, 
constructed witii tlie proposed teclinique using four support points St ~ {si, S2, S3, S4}. (b) Our target pdf Po{^) oc 
exp{ — V{x;g)} — exp{— V_j (x; g) — yjigji^))} obtained by multiplying the previous function exp{ — V__, (s; g)} times 
exp{ — Vj{gj{x))} (shown with dashed Une). The picture also shows the shape of the proposal density nt{x) consisting of 
pieces of the function exp{ — Vj{gj{x))} scaled by the constant values Lk = exp{— 7^}. 



Indeed, note that to apply the RoU 



for the functions \/p{x) and x^yp{x), as described in Section II-D 
transformation directly to an improper proposal ntix) oc exjp{Wt{x)}, with exp{Wt{x)} > p{x) and 
-Kt{x)dx — >• +00, provides us an unbounded region A so that this strategy is useless. 
In the sequel, we describe how to adaptively build a polygonal sets Vt (t = 1,2, ....) composed by 
non-overlapping triangular subsets that embed the region A. To draw uniformly from Vt, we randomly 
select one triangle (with probability proportional to its area) and then generate uniformly a sample point 
from it using the algorithm in the Appendix. If the sample belongs to A, it is accepted, and otherwise is 
rejected and the region Vt is improved by adding another triangular piece. 



In Section IV-A below, we provide a detailed description of the proposed adaptive technique. Details 



of the computation of some bounds that are necessary for the algorithm are given in Section IV-B and 
in the Appendix. 



A. Adaptive Algorithm 

Given a set of support points 

St = {Si, . . . ,SmJ, 

we assume that there always exists k' G {0, ...,mj} such that Sk' = 0, i.e., the point zero is included 
in the set of support points St- Hence, in an interval = [sk,Sk+i] the points Sfc and s^+i are both 
non-positive or both non-negative. 
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We also assume that we are able to compute upper bounds > ^/^(x), L^^^ > x^^p{x) (for x > 0) 
and > —xy/p{x) (for a; < 0) within every interval x e Ik = [s/cSfe+i], k = 0,...,mt, where 
lo = (-oo,si] and Im, = [sm,,+oo). 

1) Construction of Vt 5 A: Consider the construction in Figure |3] (a). For a pair of angles 
Ofc = arctan(sj!c) and a^+i = arctan(s/;4.i), we define the subset Ak = ^ n J'k, where J'^. is the 
cone with vertex at the origin (0, 0) and dehmited by the two straight lines that form angles ak and ak+i 
w.r.t. the u axis. Note that, clearly, A = U^^^Ak- 

For each k = 0, ...,mt the subset Ak is contained in a piece of circle Ck (Ak ^ C^) delimited by the 
angles ak, Ofe+i and radius 



rk 



4'^)' + f4'T' if Sk,Sk+i>0 



4'0' + (4'0'' if Sk,Sk+i<0 



(10) 



also shown (with a dashed line) in Fig. [3] (a). 

Unfortunately, it is not straightforward to generate samples uniformly from Ck, but we can easy draw 
samples uniformly from a triangle in the plane M^, as explained in the Appendix. Hence, we can choose 
an arbitrarily a point in the arc of circumference that delimits Ck (e.g., the point (L^^\ in Fig. [i] 
(a)) and calculate the tangent line to the arc at this point. In this way, we build a triangular region Tk 
such that Tk ^ Ck ^ Ak, with a vertex at (0, 0). 

We can repeat the procedure for every k = 0, ...,mt, with different angles ak = arctan(sfe) and 
afc+i = arctan(sfc+i), and define the polygonal region Vt — ^^IqT composed by non-overlapping 
triangular subsets. Note that, by construction, Vt embeds the entire region A, i.e., A <^Vt. 

Figure [3] summarizes the procedure to build the set Vt- In Figure |3] (a) we show the construction of 
a triangle Tk within the angles ak = arctan(s/(.), Ofc+i = arctan(sfc+i), using the upper bounds L^^^ 
and for a single interval x e Ik = [sk, Sk+i]- Figure [3](b) illustrates the entire region Vt = U^^gTfc 
formed by + 1 = 6 triangular subsets that covers completely the region A, i.e., A C Vt- 

2) Adaptive sampling: To generate samples uniformly in Vt, we first have to select a triangle 
proportionally to the areas \Tk\, k = 0, ...,mt- Therefore, we define the normalized weights 

A I'Tfc I /I 1 \ 

and then we choose a triangular piece by drawing an index k' € {0, ...,mt} from the probability 
distribution P{k) = Wk- Using the method in the Appendix, we can easily generate a point {v\u') 
uniformly in the selected triangular region Tk'- If this point {v',u') belongs to A, we accept the sample 
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(a) (b) 

Fig. 3. (a) A region A constructed by the RoU method and a triangular region Tk defining by the vertices vi, \2 and V3 built 
using the upper bounds L^^\ L^^^ for the functions ^Jp{x) and x^Jp(x) for all x £ Ik = [sk, Sfe+i]. The dashed line depicts 
the piece of circumference Ck with radius — \J {L^^^Y + {^^k^Y- The set Tk embeds the subset Ak = Ar\ Jk where 
J'k is the cone defined as v G J^k if and only if v = ^iV2 + 6*2 V3 and 61,62 > 0. (b) Construction of the polygonal region 
Vt = U^^gTfc using mt = 5 support points, i.e., St = {si,S2,S3 = 0,54,35}. Observe that each triangle Tk has a vertex at 
(0,0). The set Vt covers completely the region A obtained by the RoU method, i.e., AcVt- 



x' = v'/u' and set mt+i = mt, St+i = St and Vt = Vt- Otherwise, we discard the sample x' = v' /u' 
and incorporate it to the set of support points, St+i = StU {x'}, so that mt+i = rrit + l and the region 
Vt is improved by adding another triangle. 

3) Summary of the algorithm: The adaptive RS algorithm to generate samples from p{x) using 
RoU is summarized as follows. 

1) Initialization. Start with i = 1, t = and choose mi support points, Si = {si, Smi}- 

2) Iteration. For t>l, take the following steps. 

• From St, determine the intervals Xq, . . . ,2mj. 

• Compute the upper bounds L^^\ j = 1,2,3, for each k = 0, ...,mt (see Section IV-B and the 
Appendix). 



Construct the triangular regions Tk as described in Section IV-Al k = 0, mt. 
Calculate the area |7fc| of every triangle, and compute the normalized weights 

A \7k\ ^-nx 
Wk = ^nit iT-i (12) 

with A; = 0, ...,mt. 

Draw an index k' G {0,...,mt} from the probability distribution P{k) = Wk- 
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Generate a point {v',u') uniformly from the region (see the Appendix). 

If u' < ^Jp {^), then accept the sample x^*) = x' = ^, set i = i + 1, St+i = St and 

mt+i = mt. 



• Otherwise, if u' > yp{^), then reject the sample x' = ^, set St+i = St U {x'} and 
nit+i =mt + l. 

• Sort St+i in ascending order and increment t = t + l. lfi>N, then stop the iteration. 

It is interesting to note that the region Vt is equivalent, in the domain of x, to a proposal function 



vTf (x) formed by pieces of reciprocal uniform distributions scaled and translated (as seen in Section II-D i, 
i.e., 7rt{x) oc l/(AfcX + /3fc)^ in every interval X^., for some and /3k- 



B. Bounds for other potentials 

(i) 

In this section we provide the details on the computation of the bounds , j = 1, 2, 3, needed for 
the implementation of the algorithm above. 

We associate a potential V^^^ to each function of interest. Specifically, since p{x) oc exp{—V{x;g)} 
we readily obtain that 

V^ocexpj -yW(x;g)}, with V^'\x;g) ^ ^V{x;g), 
x^/p{x) (xexp{-V^^\x;g)}, with ^(^^(x; g) = ^^(x; g) - log(x), (x > 0), (13) 
-x^^ocexpj -y(3)(^.g)}^ with V^^\x;g)^^V{x;g)-log{-x),{x<0), 
respectively. 

Note that it is equivalent to maximize the functions ^yp{x), X\/p{x), —x\/p{x) w.r.t. x and to minimize 
the corresponding potentials V'^^\ j = 1,2,3, also w.r.t. x. As a consequence, we may focus on the 
calculation of lower bounds 7^''^ < V^^\x]g) in an interval x G X^, related to the upper bounds as 
L^P = exp{— 7^''^}, j = 1,2,3 and k = 0,...,mt. This problem is far from trivial, though. Even for 
very simple marginal potentials, Vi, i = 1, . . . ,n, the potential functions, V^^\ j = 1, 2, 3, can be highly 
multimodal w.r.t. x |[23l . 

In the Appendix we describe a procedure to find a lower bound 7^ for the potential V{x;g). We can 
apply the same technique to the function V^^\x;g) = ^V{x;g), associated to the function y^p(x), since 
yd) is a scaled version of the generalized system potential V{x;g). Therefore, we can easily compute 
a lower bound 7^.^^ < V^^\x;g) in the interval X^. 

The procedure in the Appendix can also be applied to find upper bounds for x^yp{x), with x > 0, and 
—X\/p{x) with X < 0. Indeed, recalling that the associated potentials are V^'^\x; g) = \V{x\ g) — log(x). 
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X > 0, and V^^^x; g) = ^V{x; g) — log(— x), x < 0, it is straightforward to realize that the corresponding 
modified potentials 

V^^Hx;rk)^\v{x;rk)-log{x), 

: (14) 

F(3)(^;rfc) 4 -y(x;rfc)-log(-x), 
are convex in Z^, since the functions — log(x) {x > 0) and — log(— x) (x < 0), are also convex. Therefore, 
it is always possible to compute lower bounds 7^^^ < V^'^\x;g) and 7^^^ < V^^\x;g). 
The corresponding upper bounds are L^j^^ = exp{— 7^''''}, j = 1, 2, 3 for all x G Ik- 

C. Heavier tails 

The standard RoU method can be easily generalized in this way: given the area 

Ap = [{v,u) ■.0<u< \p{v/uf)]^/^P^^'^ }, (15) 

if we are able to draw uniformly from it the pair {v, u) then x = is a sample form po{x) oc p{x). 

The region Ap is bounded if [p{x)]^^^^^^^ and x [p{x)Y^^^^^^ are bounded. It occurs when p{x) is 
bounded and the its tails decay as l/x^P^^'^^f or faster. Hence, for p > 1 we can handle pdf's with fatter 
tails than with the standard RoU method. 

In this case the potentials associated to our target pdf Po{x) oc exp{— y(x; g)} are 

VW(x;g)^^V{x;g), 

y(2)(a:;g) 4 -^Y(x;g) -log(x), for X > (16) 

y(3)(a;;g) 4 —P—yU-g) -log(-x), for X < 
p + l 

Obviouly, we can use the technique in the Appendix to obtain lower bounds and to build an extended 
adaptive RoU scheme to tackle target pdf's with fatter tails. Moreover, the constant parameter p affects 
to the shape of Ap and, as a consequence, also the acceptance rate of a rejection sampler. Hence, in 
some case it is interesting to find the optimal value of p that maximizes the acceptance rate. 

V. Examples 

In this section we illustrate the application of the proposed techniques. The first example is devoted to 



compare the performance of the first adaptive technique in Section III and the adaptive RoU algorithm in 



Section IV using an artificial model. In the second example, we apply adaptive RoU scheme as a building 
block of an accept/reject particle filter fT8\ for inference in a financial volatility model. In this second 
example the first adaptive technique cannot implemented. Moreover, note that the other generalizations 
of the standard ARS method proposed in 161 dH \T3\ [Ml cannot be applied in both examples. 
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A. Artificial example 

Let X be a positive scalar signal of interest, x € M^, with exponential prior, q{x) oc exp{— Ax}, A > 0, 
and consider the system with three observations, y = [yi,y2,y?^ G M"^^, 

yi = aexp(-62;) + y2 = c\og{dx + I) + "92, and y^ = {x - ef + -d^, (17) 

where t?2> are independent noise variables and a, b, c, d, e are constant parameters. Specifically, 
'di and ^2 are independent with generalized gamma pdf Tg{'di; ai, /3j) oc t?"' exp{— z = 1, 2, with 
parameters ai = 4, /3i = 2 and 02 = 2, /32 = 2, respectively. The variable ??3 has a Gaussian density 

A^(i93; 0,1/2) ocexp{-i?2}. 



Our goal is to generate samples from the posterior pdf p{x\y) oc piy\x)q{x). Given the system (17i 
our target density can be written as 

Po{x) = p{x\y) oc exp{-y(x; g)}, (18) 

where the potential is 

yix;g) ={yi - aexp{-bx))'^ - log[yi - a exp(-62;))^]+ 

+ (y2 - clog{dx + l) f - log[(y2 - clog{dx + 1))^]+ (19) 
+ (ys -{x- eff + Ax. 

We can interpreted it as 

V{x;g) = Vi{gi{x)) + V2{g2{x)) + VM^)) + Mgi{x)) (20) 

where the marginal potentials are Vi (t?) = -d"^ — log['i9^] with minimum at /xi = V2 {'&) = — log['!?^] 
with minimum at /i2 = 1, V^{'d) = with minimum at /X3 = and Vi{'d) = \\'d\ with minimum at 
/i4 = (we recall that A > and x € M+). Moreover, the vector of nonlinearities is 

?,{x) = [gi{x) = aexjp{-bx),g2{x) = clog{dx + l),g3ix) = (x - ef,g4{x) = x]. (21) 

Since all marginal potentials are convex and all nonlinearities are convex or concave, we can apply the 
proposed adaptive RoU technique. Moreover, since exp{— V4((74(x))} = exp{— Ax} is easy to draw from, 
we can also applied the first technique described in Section [in] 
Note, however, that 

1) the potential V{x;g) in Eq. ( [19) ) is not convex, 

2) the target function p{x\y) oc exp{— y(x;g)} can be multimodal, 

3) the tails of the potential V{x; g) are not convex and 
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4) it is not possible to study analytically the first and the second derivatives of the potential V{x;g). 
Therefore, the techniques in lITTl [6l [T3l HM and the basic strategy in [22] cannot be used. 



The procedure in Section III can be used because of the pdf q4,{x) oc exp{—V4{g4^{x))} = exp{— x} can 
be easily integrated and sampled, even if it is restricted in a interval Ik- In this case, the reduced potential 
is y_4(x;g) = Vi{gi{x)) + V2{g2{x)) + %{g3{x)) and the corresponding function exp{— y_4(x; g)} is 
the likelihood, i.e., p{y\x) = ex.p{—V-4{x;g)}, since qi{x) coincides with the prior pdf. Moreover, in 
order to use the RoU method we additionally need to study the potential functions V^^\x; g) = ^V{x; g) 
and V^"^^ {x; g) = ^V{x; g) — log(x). Since we assume x > 0, it is not necessary to study V^'^^ as described 
in Section ITV-Bl 

We set, e.g., a = -2, b = 1.1, c = -0.8, d = 1.5, e = 2, A = 0.2 and y = [2.314, 1.6, 2], and we start 
with the set of support points So = {0,2 — \/2, 2,2 + \/2} (details of the initialization step are explained 
in II22II ). Figure |4] (a) shows the posterior density p{x\y) and the corresponding normalized histogram 
obtained by the first adaptive rejection sampling scheme. Figure [4] (b) depicts, jointly, the likelihood 
function p{y\x) = exp{— y_4(x; g)} and its stepwise approximation exp{— 7^} Vx G Ik, k = 0, ...,mt. 



The computation of the lower bounds jk < V-4{x; g) has been explained in Appendix and Section III 



Figure |4] (c) depicts the set A (solid) obtained by way of the RoU method, corresponding to the 
posterior density p(x|y) and the region U^qT^ formed by triangular pieces, constructed as described in 



Section IV- A with rrit = 9 support points. 

The simulations shows that both method attain very similar acceptance rate^ In Figure [s] (a)-(b) 
are shown the curves of acceptance rates (averaged over 10,000 independent simulation runs) versus 



the first 1000 accepted samples using the technique described in Section III and the adaptive RoU 
algorithm explained in Section [IVj respectively. More specifically, every point in these curves represent 
the probability of accepting the i-th sample. We can see that the methods are equal efficient, and the 
rates converge quickly close to 1. 



B. Stochastic volatility model 

In this example, we study a stochastic volatility model where only the adaptive RoU scheme can be 
applied. Let be Xk G the volatility of a financial time series at time k, and consider the following 

^We define the acceptance rate Ri as the mean probability of accepting the i-th sample from an adaptive RS algorithm. 
In order to estimate it, we have run M = 10,000 independent simulations and recorded the numbers j = 1,...,M, 
of candidate samples that were needed in order to accept the i-th sample. Then, the resulting empirical acceptance rate is 
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(a) (b) (c) 

Fig. 4. (a) The target function Po{x) — p{x\y) and the normaUzed histogram obtained by the adaptive RoU scheme, (b) 
The function exp{ — Vl4(a;; g)} obtained using the reduced potential V-4{x; g) — V{x; g) — V4{g4,{x)) and its constant upper 
bounds exp{— 7fc}, k = 0,...,mt. In this example, the function p{y\x) — exp{ — V_4(a::; g)} coincides with the likelihood, 
(c) The set A corresponding to the target pdf p{x\y) using the RoU method and the region U^JiQ^Tk, formed by triangles, 
constructed using the second adaptive rejection sampling scheme. 
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Fig. 5. (a) The curve of acceptance rates (averaged over 10,000 simulations) as a function of the first 1000 accepted samples 
using the first adaptive scheme described in Section |lll] (b) The curve of acceptance rates (averaged over 10,000 simulations) 
as a function of the first 1000 accepted samples using the proposed adaptive RoU technique. 



State space system ||5l Chapter 91- |[T6l with scalar observation, G M, 

log(x|)=/31og(x2_J+l?2,,, ^^^^ 

yk = log(x|) + T?i,fc, 

where k £ N denotes discrete time, /3 is a constant, ■d2,k ~ -^(^92,fc; 0, o"^) is a Gaussian noise, i.e., 
p{'^2,k) oc exp{— -i?2 fc/2(T^}, while -di^k has a density p(t?i,fc) oc exp{'(?i^fc/2 — exp(i?i_fc)/2} obtained 
from the transformation i?i fc = log[i?Q ^] of a standard Gaussian variable ??o,fc ~ N{^Q^k', 0, 1). 
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Given the system in Eq. (22 1, the transition (prior) pdf 



r log(xi)-/31og(4_i) | 
cx exp i ^ > , (23) 



and the UkeUhood function 

Vk - log{xl) exp(yfe - log(x^)) 



p{yk\xk) ocexp 



exp 



yk-log{xl) ex-p{yk) - l/xj 



(24) 



2 2 

We can apply the proposed adaptive scheme to implement an particle filter. Specifically, let 
be a collection of samples from p{xk-i\yi:k^i)- We can approximate the predictive density as 

Pixk\yi:k-i) = J piXk\Xk-l)piXk-l\yi:k-l)dXk-l 

1 ^ 

- J^Ep(^^\^k-i) (25) 

i=l 

and then the filtering pdf as H 

1 ^ 

p{xk\yi:k) fxp(yfckfe)^^p(2;fc|2;^*li). (26) 

i=l 

If we can draw exact samples {x^*^}^]^ from (26 1 using a RS scheme, then the integrals of measurable 
functions /(/) = / f {xk)p{xk\yi-k)dxk w.r.t. to the filtering pdf can be approximated as /(/) w InU) = 
7f Si=i f{x[.^). 



To show it, we first recall that the problem of drawing from the pdf in Eq. (26 1 can be reduced to 
generate an index j G {1, N} with uniform probabilities and then draw a sample from the pdf 

^k^ ~ Po,j{xk) oc p{yk\xk)p{xk\x[^li)- (27) 

Note that, in order to take advantage of the adaptive nature of the proposed technique, one can draw first 
N indices ji, ...,jr, ■■■Jn, with € {1, N}, and then we sample Nj. particles from the same proposal, 
^i*"^ ~ PojXxk), 171 = 1, ...,Nr, where Nr is the number of times the index r G {1, A^} has been 
drawn. Obviously, Ni + N2 + ... + Nn = N. 

The potential function associated with the pdf in Eq. ([27]) is 



^^^'^'^^ = 2 + + 2^1 + 2^^ ^ 

where = /31og is a constant and g{xk) = [gi{xk) = log(x|), 92(2;^) = -log(a;^) + Ofc]- 



The potential function in Eq. (28 1 can be expressed as 

V{xk;g) = Vi{gi{xk))+V2{g2ixk)), (29) 
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where the marginal potentials are Vi(??i) = 5( — ^?i + expl-i?!}) and V2(??2) = 2^'&2- Note that in this 
example: 

• Since the potential V{xk]g) is in general a non-convex function and the study of the first and the 
second derivatives is not analytically tractable, the methods in ||6l [TTl cannot be applied. 

• Moreover, since the potential function V{xk;g) has concave tails, as shown in Figure |6] (a), the 
method in lfT3l and the basic adaptive technique described in ll22l cannot be used either. 

• In addition, we also cannot apply the first adaptive procedure in Section [III] because there are 
no simple (direct) techniques to draw from (and to integrate) the function ex.p{—Vi{gi{xk))} or 
eyip{-V2{g2{xk))}- 

But, since the marginal potentials Vi and V2 are convex and we also know the concavity of the 
nonlinearities gi{xk) and g2{xk), we can apply the proposed adaptive RoU scheme. Therefore, the 
used procedure can be summarized in this way: at each time step k, we draw N time from the set 
of index r G { 1 , • • • , N} with uniform probabilities and denote as A^^^ the number of repetition of index 



r (A^i + ... + Nn = N). Then we apply the proposed adaptive RoU method, described in Section IV-A 



to draw Nr particles from each pdf of the form of Eq. ( 27 1 with j = r and r = 1, ....,N. 

Setting the constant parameters as /3 = 0.8 and a = 0.9, we obtain an acceptance rate ^ 42% (averaged 
over 40 time steps in 10, 000 independent simulation runs). It is important to remark that it is not easy to 
implement a standard particle filter to make inference directly about x^ (not about log(xfc)), because it 



is not straightforward to draw from the prior density in Eq. (23 1. Indeed, there are no direct methods to 
sample from this prior pdf and, in general, we need to use a rejection sampling or a MCMC approach. 

Figure [6] (b) depicts 40 time steps of a real trajectory (solid line) of the signal of interest x^ generated 
by the system in Eq. (|22]), with /3 = 0.8 and a = 0.9. In dashed line, we see the estimated trajectory 
obtained by the particle filter using the adaptive RoU scheme with N = 1000 particles. The shadowed 
area illustrates the standard deviation of the estimation obtained by our filter. The mean square error 
(MSE) achieved in the estimation of x^ using N = 1000 particles is 1.48. 

The designed particle filter is specially advantageous w.r.t. to the bootstrap filter when there is a 
significant discrepancy between the likelihood and prior functions. 



VI. Conclusions 

We have proposed two adaptive rejection sampling schemes that can be used to draw exactly from a 
large family of pdfs, not necessarily log-concave. Probability distributions of this family appear in many 
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(a) 



(b) 



Fig. 6. (a) An example of potential function V{xk;g) defined in Eq. i28i, when Ok = I, Vk = "2 and a = 0.8. The right 



tail is concave, (b) An example of trajectory with 40 time steps of the signal of interest Xk- The solid line illustrates the real 
trajectory generated by the system in Eq. (j22| with /3 = 0.8 and a — 0.9 while the dashed line shows the estimated trajectory 
attained by the particle filter with TV = 1000 particles. The shadowed area shows the standard deviation of the estimation. 



inference problems as, for example, localization in sensor networks ifTSl l22l l23l . stochastic volatility (SI 
Chapter 9], IH or hierarchical models fU. Chapter 9], E H 13- 

The new method yields a sequence of proposal pdfs that converge towards the target density and, as 
a consequence, can attain high acceptance rates. Moreover, it can also be applied when the tails of the 
target density are log-convex. 

The new techniques are conceived to be used within more elaborate Monte Carlo methods. It enables, 
for instance, a systematic implementation of the accept/reject particle filter of lITSl . There is another 
generic application in the implementation of the Gibbs sampler for systems in which the conditional 
densities are complicated. Since they can be applied to multimodal densities also when the tails are log- 
convex, these new techniques has an extended applicability compared to the related methods in literature 
161 Ull [131 ISni 1221 ^ as shown in the two numerical examples. 



The first adaptive approach in Section III is easier to implement. However, the proposed adaptive 
RoU technique is more general than the first scheme, as show in our application to a stochastic volatility 
model. Moreover using the adaptive RoU scheme to generate a candidate sample we only need to draw two 
uniform random variates but, in exchange, we have to find bound for three (similar) potential functions. 
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VIII. Appendix 

A. Calculation of lower bounds 

In some cases, it can be useful to find a lower bound 7;^. € M such that 

7fc < miny(x;g), (30) 

X&Ik 

in some interval = [s^, Sfc+i]. 

If we are able to minimize analytically the modified potential V{x;rf^), we obtain 

7fc = mill V{x; r^) < min V{x; g). (31) 

If the analytical minimization of the modified potential V{x;rf^) remains intractable, since the modified 
potential V{x;rk) is convex \/x € Ik, we can use the tangent straight line Wk{x) to V{x;rk) at an 
arbitrary point x* G Ik to attain a bound. Indeed, a lower bound Vx G = [sk, Sfc+i] can be defined as 

7fc = min[wfc(sfc), Wfc(sfc+i)] < min V{x; r^) < min F(x; g). (32) 



The two steps of the algorithm in Section |II-B are described in the Figure [7] (a) shows an example 



of construction of the linear function Wk{x) in a generic interval Ik C V. Figure [v] (b) illustrates the 
construction of the piecewise linear function Wt{x) using three support points, rrit = 3. Function Wt{x) 
consists of segments of linear functions Wk{x). Figure [S] (b) depicts this procedure to obtain a lower 
bound in an interval Ik for a system potential V{x; g) (solid line) using a tangent line (dotted line) to 
the modified potential V{x;rk) (dashed line) at an arbitrary point x* G X^. 



B. Sampling uniformly in a triangular region 

Consider a triangular set T in the plane defined by the vertices vi, V2 and V3. We can draw 
uniformly from a triangular region |[24l . ||3l p. 570] with the following steps: 

1) Sample m ~ Ui[0, 1]) and U2 ~ U{[0, 1]). 

2) The resulting sample is generated by 

x' = vi min[ni, U2] + V2(l — max[ni, U2]) + V3(max[ni, U2] — min[ui, U2])- (33) 

The samples x' drawn with this convex linear combination are uniformly distributed within the triangle 
T with vertices vi, V2 and V3. 
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Fig. 7. (a) Example of construction of the linear function Wk{x) inside a generic interval T — [sk,Sk+i]- The picture shows 
a non-convex potential V{x; g) in solid line while the modified potential V(x; tk) is depicted in dashed line for Vx G I^. The 
linear function Wk{x) is tangent to V{x;Tk) at a arbitrary point x* £ Ik- (b) Example of construction of the piecewise linear 
function Wt{x) with three support points St — {s\, S2, Smt=3}. The modified potential V{x; Vk), for x £ Xk, is depicted with 
a dashed line. The piecewise linear function Wt(x) consists of segments of linear functions Wk{x) tangent to the modified 
potential V{x;rk)- 




Fig. 8. The picture shows the potential V{x;g), the modified potential V{x;rk) an the tangent line Wk{x) to the modified 
potential at an arbitrary point x* in Tk- The lower bound 7fe is obtained as 7fc = min[wk{sk) ,Wk(sk+i)\- In this picture, we 
have 7fc = Wk(sk). 
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